Abstract

In this article we consider the volatility inference in the presence of both market microstructure 
noise and endogenous time. Estimators of the integrated volatility in such a setting are proposed, 
and their asymptotic properties are studied. Our proposed estimator is compared with the existing 
popular volatility estimators via numerical studies. The results show that our estimator can have 
substantially better performance when time endogeneity exists. 
1. Introduction

In recent years there has been growing interest in inference for asset price volatilities based on high-frequency financial data. Suppose that the latent log price $X = (X_t)$ follows an Itô process

$$dX_t = \mu_t\,dt + \sigma_t\,dW_t, \quad \text{for } t \in [0,1], \tag{1}$$

where $W$ is a standard Brownian motion, and the drift $(\mu_t)$ and volatility $(\sigma_t)$ are both stochastic processes. Econometric interest usually lies in inference for the integrated volatility, i.e., the quadratic variation, of the log price process

$$\langle X,X\rangle_t = \int_0^t \sigma_s^2\,ds.$$

A classical estimator from probability theory (see, for example, Jacod and Protter (1998), Barndorff-Nielsen and Shephard (2002)) for this quantity is the realized volatility (RV) based on the discrete time observations

$$X_{t_i} \quad \text{for } 0 = t_0 < t_1 < \cdots < t_{N_1} = 1,$$

where the $t_i$'s may be a sequence of stopping times. The RV $[X,X]_t$ is defined as the sum of squared log returns

$$[X,X]_t = \sum_{t_i \le t} (\Delta X_{t_i})^2,$$

where $\Delta X_{t_i} = X_{t_i} - X_{t_{i-1}}$ for $i \ge 1$. Under mild conditions, when the observation frequency $N_1$ goes to infinity, $[X,X]_t \xrightarrow{p} \langle X,X\rangle_t$. Furthermore, when the observation times $(t_i)_{i\ge 0}$ are independent of $X$, a complete asymptotic theory for the estimator $[X,X]_t$ is available, which says that $\sqrt{N_1}\,([X,X]_t - \langle X,X\rangle_t)$ is asymptotically a mixture of normals whose mixture component is the variance equal to $2\int_0^t \sigma_s^4\,dH_s$, where $H_t$ is the "quadratic variation of time" process, provided that the following limit exists (see Mykland and Zhang (2006) or Mykland and Zhang (2012)):

$$\operatorname{plim}_{N_1\to\infty}\; N_1 \sum_{t_i \le t} (t_i - t_{i-1})^2 = H_t,$$

where "plim" stands for limit in probability. The quantity $\int_0^t \sigma_s^4\,dH_s$ can be consistently estimated by the quarticity $N_1/3 \cdot [X,X,X,X]_t := N_1/3 \cdot \sum_{t_i \le t} (\Delta X_{t_i})^4$.
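The realized quantities introduced so far (RV, tricity and quarticity) are all power sums of log returns, which the following minimal Python sketch computes. The simulated path is purely illustrative and is an assumption of this example, not part of the paper's designs: a driftless constant-volatility log price observed without noise on an equidistant grid, so that $H_t = t$ and the quarticity targets $\sigma^4$.

```python
import numpy as np


def realized_power_variation(log_prices, power):
    """Sum of the `power`-th powers of log returns (power=2 gives the RV)."""
    returns = np.diff(np.asarray(log_prices, dtype=float))
    return float(np.sum(returns ** power))


def simulate_log_price(n_obs, sigma, x0=0.0, seed=0):
    """Illustrative path: driftless, constant volatility, equidistant times i/n_obs."""
    rng = np.random.default_rng(seed)
    increments = sigma * np.sqrt(1.0 / n_obs) * rng.standard_normal(n_obs)
    return x0 + np.concatenate(([0.0], np.cumsum(increments)))


n_obs, sigma = 20_000, 0.02
x = simulate_log_price(n_obs, sigma)

rv = realized_power_variation(x, 2)                            # [X, X]_1
tricity = np.sqrt(n_obs) * realized_power_variation(x, 3)      # sqrt(N_1) [X, X, X]_1
quarticity = n_obs / 3.0 * realized_power_variation(x, 4)      # N_1/3 [X, X, X, X]_1

print(rv, tricity, quarticity)
```

In this exogenous, equidistant setting the tricity should be close to zero; a clearly nonzero value on real data is the symptom of time endogeneity discussed in the text.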

The above provides a foundation for estimating the integrated volatility based on high frequency 
data. However, when it comes to the practical side, the assumptions for RV are often violated. Two 
aspects are of great importance. They are 

(a) Market microstructure noise; and 

(b) Endogeneity in the price sampling times. 

For the first issue, there is by now a large literature on estimating quantities of interest when prices are observed with microstructure noise. One commonly used assumption is that the noise is additive and one observes

$$Y_{t_i} = X_{t_i} + \varepsilon_{t_i}, \quad \text{for } i = 0, 1, \ldots, N_1. \tag{2}$$

It is often assumed that the noise $(\varepsilon_{t_i})_{i\ge 0}$ is an independent sequence of white noise and the sampling times $(t_i)_{i\ge 1}$ are independent of $X$. Various estimators of integrated volatility have been proposed. See, for example, the two scales realized volatility of Zhang, Mykland and Aït-Sahalia (2005), the multi-scale realized volatility of Zhang (2006), the realized kernels of Barndorff-Nielsen et al. (2008), the pre-averaging method of Jacod et al. (2009) and the QMLE method of Xiu (2010). Related works include Aït-Sahalia, Mykland and Zhang (2005), Bandi and Russell (2006), Fan and Wang (2007), Hansen and Lunde (2004a), Kalnina and Linton (2008), Li and Mykland (2007), Phillips and Yu (2007), among others.

In contrast, issue (b) has only recently been brought to researchers' attention. The case when the sampling times are irregular or random but (conditionally) independent of the price process has been studied by Aït-Sahalia and Mykland (2003), Duffie and Glynn (2004), Meddahi, Renault and Werker (2006), Hayashi, Jacod and Yoshida (2011), among others. A recent work of Renault and Werker (2011) provides a detailed discussion of the possible endogenous effect that stems from the price sampling times in a semi-parametric context. Li et al. (2009) further investigate the time endogeneity effect on volatility estimation in a nonparametric setting. Volatility estimation in the presence of endogenous time has been studied in some special situations, such as when the observation times are hitting times, in Fukasawa (2010a) and Fukasawa and Rosenbaum (2012), and in a general situation in Fukasawa (2010b). In Li et al. (2009), the analysis was carried out by considering the time endogeneity effect, which is reflected by

$$\operatorname{plim}\,\sqrt{N_1}\,[X,X,X]_t, \tag{3}$$

where $\sqrt{N_1}\,[X,X,X]_t := \sqrt{N_1}\sum_{t_i \le t}(\Delta X_{t_i})^3$ is the tricity. Interestingly, the literature usually neglects the important information one could draw from the quantity $[X,X,X]_t$, which can be interpreted as a measure of the covariance between the price process and time, as shown in Li et al. (2009). Li et al. (2009) also conducted empirical work that provides compelling evidence that the endogenous effect does exist in financial data, i.e., $\operatorname{plim}\sqrt{N_1}\,[X,X,X]_t \ne 0$.

Although each of the issues (a) and (b) has been studied individually in the literature, there is a lack of studies that take both the microstructure noise and the time endogeneity effect into consideration. Robert and Rosenbaum (2012) study the estimation of the integrated (co-)volatility for an interesting model where the observation times are triggered by exiting from certain "uncertainty zones", in which case both microstructure noise and time endogeneity may exist. In this paper, we consider the presence of both microstructure noise and time endogeneity in a general setting.

The paper is organized as follows. The setup and assumptions are given in Section 2. The main results are given in Section 3. In Section 4, simulation studies are performed in which our proposed estimator is compared with several existing popular estimators. Section 5 concludes. The proofs (except that of Proposition 1 below) are given in the Appendix; the proof of Proposition 1 is given in the supplementary article Li, Zhang and Zheng (2013).



2. Setup and assumptions 

Assumption 1. We assume the setting of (1) and (2), and that there is a filtration $(\mathcal{F}_t)_{t\ge 0}$, with respect to which $W$, $\mu$ and $\sigma$ in (1) are adapted and $(t_i)_{i\ge 1}$ are $(\mathcal{F}_t)$-stopping times. Furthermore, the filtration $(\mathcal{F}_t)$ is generated by finitely many continuous martingales.

In the Introduction, we adopted the notation $N_1$ for the number of observed prices over the time interval $[0,1]$. Here, we generalize this and denote

$$N_t = \max\{i : t_i \le t\}.$$

In developing limiting results, one should be able to rely on some index variable approaching infinity/zero. In our context, we assume that $\max_i \Delta t_i \to 0$ is driven by some underlying force, for instance, $n \to \infty$, where $n$ (non-random) characterizes the sampling frequency over the time interval $[0,1]$.

We aim at effectively estimating $\langle X,X\rangle_t$ based on our general setup. A local averaging approach is adopted. We consider the time endogeneity on the sub-grid level. Take the single sub-grid case for illustration: the sub-sample $\mathcal{S} = \mathcal{S}_0 := \{t_p, t_{p+q}, \ldots, t_{p+iq}, \ldots\}$ is constructed by choosing every $q$th observation (starting from the $p$th observation) from the complete grid. Here $p$ is the number of observations that we take in constructing the local average, $q$ is the size of the blocks, and both are non-random numbers just as $n$. Define



which satisfies that $\ell q < n$, and as $p$ shall be taken as $o(n)$, $\ell q/n \to 1$ as $n \to \infty$. As $n$ measures the sampling frequency of the complete grid, $\ell$ measures that of the sub-grid $\mathcal{S}$. Moreover, for notational ease, for $k = 0, 1, \ldots, q-1$, we define

$$t^k_{i,j} := t_{iq+p-j+k}, \quad \text{for } i = 0,1,2,\ldots, \text{ and } j = 0,1,2,\ldots,p-1, \tag{5}$$

and let

$$t_{i,j} := t^0_{i,j}.$$

Analogous to (3), we consider the quantity $\sqrt{\ell}\,[X,X,X]^{\mathcal{S}}_t = \sqrt{\ell}\sum_{t_{i,0}\le t}(X_{t_{i,0}} - X_{t_{i-1,0}})^3$. The superscript $\mathcal{S}$ indicates that the calculation is performed on the designated sub-grid. This convention applies to other sub-grids. Moving the sub-grid $\mathcal{S}$ one step forward forms sub-grid $\mathcal{S}_1$; continuing this process gives sub-grid $\mathcal{S}_2$ and so on until the $(q-1)$th sub-grid $\mathcal{S}_{q-1}$. Figure 1 provides a graphical demonstration of our grid allocation. Further, on the sub-grid $\mathcal{S}$, we define the number of observations up to time $t$ as

$$L_t := \max\{i : t_{i,0} \le t\}.$$

Naturally, $L_1$ and $N_1$ satisfy $L_1 \le N_1/q$.







Figure 1: Grid allocation for Local Averaging. 



3. Main results 

We start with results based on a single sub-grid and then proceed to the multiple sub-grids case. 



3.1. Single sub-grid: Local Averaging 

A natural and effective way of reducing the effect of microstructure noise in estimating $\langle X,X\rangle_t$ is averaging; see, e.g., Jacod et al. (2009) and Podolskij and Vetter (2009). Following Jacod et al. (2009), we average every $p$ observations that precede each observation in the sub-sample $\mathcal{S}$ to obtain a new sequence of observations, which we denote by $(\bar{Y}_{t_{i,0}})_{i\ge 0}$. Based on this sequence of observations, we obtain a single-grid biased local averaging estimator. To be specific,

$$\bar{Y}_{t_{i,0}} = \frac{1}{p}\sum_{j=0}^{p-1} Y_{t_{i,j}}, \quad \text{for } i = 0,1,2,\ldots.$$

The RV based on the $\bar{Y}$ sequence is denoted by

$$[\bar{Y},\bar{Y}]^{\mathcal{S}}_t = \sum_{t_{i,0}\le t}(\Delta \bar{Y}_{t_{i,0}})^2,$$

where $\Delta\bar{Y}_{t_{i,0}} = \bar{Y}_{t_{i,0}} - \bar{Y}_{t_{i-1,0}}$ for $i \ge 1$. After correcting the bias due to noise, the single-grid local averaging estimator is defined as

$$\widehat{\langle X,X\rangle}^{LA}_t := [\bar{Y},\bar{Y}]^{\mathcal{S}}_t - \frac{2L_t}{p}\,\hat\sigma^2_\varepsilon, \quad \text{for } t \in [0,1], \tag{6}$$

where

$$\hat\sigma^2_\varepsilon = [Y,Y]_1/(2N_1) \tag{7}$$

is an estimator of $\sigma^2_\varepsilon$ (see Lemma 1 in Appendix A.1), and $[Y,Y]_1$ is the RV based on all observations up to time 1. We now state conditions that lead to the theorem for the single sub-grid case:

C(1). $\mu_t$ and $\sigma_t^2 > c > 0$ are integrable and locally bounded;

C(2). $n/N_t = O_p(1)$;

C(3). $\Delta_n := \max_{1\le i\le N_1}|t_i - t_{i-1}| = O_p(1/n^{1-\eta})$ for some nonnegative constant $\eta$;

C(4). $L_t/\ell \xrightarrow{p} \int_0^t r_s\,ds$ in $D[0,1]$, where $r_s$ is an adapted integrable process (and hence in particular, $N_1/n = O_p(1)$);

C(5). The microstructure noise sequence $(\varepsilon_{t_i})_{i\ge 0}$ consists of independent random variables with mean 0, variance $\sigma^2_\varepsilon$, and common finite third and fourth moments, and is independent of $\mathcal{F}_1$.

The following theorem characterizes the asymptotic property of the estimator (6).

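Theorem 1. Assume Assumption 1 and conditions C(1)-C(5). Suppose that $\eta \in [0,1/6)$, and $\ell \sim C_\ell n^{\alpha}$ and $p \sim C_p n^{\alpha}$ for some $0 < \alpha < 2(1-\eta)/5$ and positive constants $C_\ell$ and $C_p$, and also that

$$\ell\,[X,X,X,X]^{\mathcal{S}}_t \xrightarrow{p} \int_0^t u_s\sigma_s^4\,ds \quad \text{for every } t \in [0,1], \text{ and} \tag{8}$$

$$\sqrt{\ell}\,[X,X,X]^{\mathcal{S}}_t \xrightarrow{p} \int_0^t v_s\sigma_s^3\,ds \quad \text{for every } t \in [0,1], \tag{9}$$

where $[X,X,X,X]^{\mathcal{S}}_t = \sum_{t_{i,0}\le t}(X_{t_{i,0}} - X_{t_{i-1,0}})^4$, and $u_s\sigma_s^4$ and $v_s\sigma_s^3$ are both integrable. Then, stably in law,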

2 



n ( - — — - LA 

Vi((x,x) t - (x,x)t 



t 

v s a s dX s + 



o 



asymptotic bias 







2 4v 2 s \ 4 in (Ci 2 \ 2 C t o 



9 r* v<vv c v b 



1/2 

dB« 






where $B_t$ is a standard Brownian motion independent of $\mathcal{F}_1$.

Proof of the theorem is given in Appendix A.2.

In the literature it is often assumed that the mesh $\Delta_n = O_p(1/n)$, in other words, $\eta = 0$ in Condition C(3). In this case, the convergence rate in Theorem 1 can be arbitrarily close to $n^{1/5}$.
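As an illustration of the single-grid construction, here is a minimal Python sketch of the local averaging estimator (6) with the noise variance estimator (7), evaluated at $t = 1$. The function name, the return signature and the simulated data (constant volatility, equidistant times, Gaussian noise) are assumptions made for this example; the sketch applies neither the $(1 + A(p,q))$ adjustment of Section 3.2 nor the bias correction of Section 3.3.

```python
import numpy as np


def local_average_estimator(y, p, q):
    """Single-grid local averaging estimate of <X, X>_1, following (6)-(7).

    y : noisy log prices Y_{t_0}, ..., Y_{t_{N_1}} on the complete grid
    p : number of observations averaged at each sub-grid point
    q : block size (spacing of the sub-grid on the complete grid), q >= p
    Returns (estimate, noise variance estimate, L_1).
    """
    y = np.asarray(y, dtype=float)
    n_returns = y.size - 1                        # N_1
    if not 1 <= p <= q or n_returns < p + q:
        raise ValueError("need 1 <= p <= q and at least p + q returns")

    n_blocks = (n_returns - p) // q               # L_1: largest i with i*q + p <= N_1
    ends = np.arange(n_blocks + 1) * q + p        # indices of t_{i,0} = t_{iq+p}

    # Local averages of Y over indices iq+1, ..., iq+p, via a cumulative sum.
    csum = np.concatenate(([0.0], np.cumsum(y)))
    y_bar = (csum[ends + 1] - csum[ends + 1 - p]) / p

    rv_bar = np.sum(np.diff(y_bar) ** 2)                       # RV of averaged sequence
    noise_var = np.sum(np.diff(y) ** 2) / (2.0 * n_returns)    # equation (7)
    estimate = rv_bar - 2.0 * n_blocks / p * noise_var         # equation (6) at t = 1
    return float(estimate), float(noise_var), int(n_blocks)


# Illustrative data (assumed): constant volatility, equidistant times, Gaussian noise.
rng = np.random.default_rng(1)
n, sigma, noise_sd = 46_800, 0.02, 0.0005
x = np.concatenate(([0.0], np.cumsum(sigma / np.sqrt(n) * rng.standard_normal(n))))
y = x + noise_sd * rng.standard_normal(n + 1)

la_estimate, noise_var_hat, n_blocks = local_average_estimator(y, p=5, q=20)
print(la_estimate, sigma ** 2, noise_var_hat, noise_sd ** 2, n_blocks)
```

With fixed $p = 5$ and $q = 20$ the output can be expected to sit somewhat below $\sigma^2$, because averaging also smooths the latent process $X$ itself; the theorem concerns the regime in which this effect is asymptotically negligible.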

Remark 1. Unlike in the full grid setting, where a nonzero limit of tricity can be easily generated by letting the sampling times be hitting times of asymmetric barriers (see for instance Examples 4 & 5 of Li et al. (2009)), in the sub-grid case a nonzero limit of tricity is far less common, and in particular under the settings of both Examples 4 & 5 of Li et al. (2009), the limit in (9) vanishes. However, as we found in simulation studies (not all reported), even in these situations, adopting the (finite sample) bias correction discussed in Section 3.3 below can substantially reduce the (finite sample) bias. A similar remark applies to the estimator in Theorem 2 below.

3.2. Multiple sub-grids: Moving Average 

We show in this subsection that for any $\varepsilon > 0$, rate $n^{1/4-\varepsilon}$ consistency can be achieved by using a moving average based on multiple sub-grids. For that purpose, we need notation such as $[\bar{Y},\bar{Y}]^{\mathcal{S}_k}_t$, i.e. the RV of the locally averaged $Y$ process over the $k$th sub-grid, for the same operations that are performed over the 0th sub-grid $\mathcal{S} = \mathcal{S}_0$ being adjusted to the $k$th sub-grid $\mathcal{S}_k$. To be specific, we take $[\bar{Y},\bar{Y}]^{\mathcal{S}_k}_t$ for example; other notation with superscript $k$ or $\mathcal{S}_k$ has a similar interpretation. Similar to the definition of $[\bar{Y},\bar{Y}]^{\mathcal{S}}_t$ (i.e. $[\bar{Y},\bar{Y}]^{\mathcal{S}_0}_t$), we first define

$$\bar{Y}_{t^k_{i,0}} := \frac{1}{p}\sum_{j=0}^{p-1} Y_{t^k_{i,j}}, \quad \text{for } i = 0,1,2,\ldots,$$

where, recall that $t^k_{i,0} = t_{iq+p+k}$ denotes the $i$th observation time on the $k$th sub-grid. The RV of the locally averaged $Y$ process over the $k$th sub-grid is defined as follows:

$$[\bar{Y},\bar{Y}]^{\mathcal{S}_k}_t := \sum_{t^k_{i,0}\le t}(\Delta\bar{Y}_{t^k_{i,0}})^2,$$

where $\Delta\bar{Y}_{t^k_{i,0}} = \bar{Y}_{t^k_{i,0}} - \bar{Y}_{t^k_{i-1,0}}$ for $i \ge 1$. Assume the following conditions that lead to the asymptotic result on multiple sub-grids:

C(6). £E k <t (E£l q -f-^X ti _^ (AX ti ) 2 f*w s a 4 s ds for every t G [0,1], where w s at is inte- 
grable; 

c ( 7 )- \ ELo X ' X ]t k So Vso-I ds for every t E [0, 1], where v%oj is integrable. 

Define 

/„"2 



Under the conditions of Theorem [2] below, A(p,q) ~ — n Aa 2 CiC p /3. 
Theorem 2. Assume Assumption 1 and conditions C(1) to C(7). Suppose that $\eta \in [0,1/9)$, and $\ell \sim C_\ell n^{\alpha}$ and $p \sim C_p n^{3\alpha-1}$ for some $\max(4\eta, 1/3) < \alpha < (1-\eta)/2$ and positive constants $C_\ell$ and $C_p$. Then, stably in law,

g-i 



yTi ( \ Y]f k - ^af - (1 + A(p, q))(X, X) t 



k=0 



pq 



- [ v s a s dX s + [ 
Jo Jo 



1/2 



dB s , 



asymptotic bias 

where $B_t$ is a standard Brownian motion that is independent of $\mathcal{F}_1$.

Proof of the theorem is given in Appendix A.3.

If one assumes that $\Delta_n = O_p(1/n)$, then $\eta = 0$, and the convergence rate in the above theorem can be arbitrarily close to $n^{1/4}$.

Remark 2. If times are exogenous, Condition C(6) can be reduced to an assumption similar to (48) on p. 1401 of Zhang, Mykland and Aït-Sahalia (2005). The limit is then related to quarticity and can be consistently estimated; see, e.g., Jacod et al. (2009), Barndorff-Nielsen et al. (2008). In general, when observation times can be endogenous, the limit is expected to be different.



3.3. Bias Correction 

Since the estimator constructed based on multiple grids achieves a better rate of convergence, 
below we shall mainly focus on the moving average setting. Based on the above result, we have the 
following (infeasible) unbiased estimator: 



VI(l + A(p,q)) 

The following Corollary describes the asymptotic property for this estimator. 
Corollary 1. Under the assumptions of Theorem 2, stably in law, 



4w« 



9 



8Ci 



Cm 



1/2 



dB s , 



sn({x^)T -<*,*>i) 

where $B_t$ is a standard Brownian motion that is independent of $\mathcal{F}_1$.

Proof. This is just a rearrangement of the convergence in Theorem 2. ■ 

To improve over $\widehat{\langle X,X\rangle}^{(0)}_1$ and build a feasible unbiased estimator, a consistent estimator for the bias term $\frac{2}{3}\int_0^1 v_s\sigma_s\,dX_s$ is needed. This is the issue that we deal with next. Define

ana /W(t): ^ 



V£(l + A(p,q)) 



t • 



(10) 



g-l 



fc=0 






For a given partition (Tj)j>o over [0, 1], we define 

f®(t) := (F^(n) - Fjf>(Ti-i))/(Ti - n-t), for t G h,r m ), for j = 2,3. 
We then have that stably in law, 

2 



(11) 



Vi{F^(t)-(X,X) t 
Define 



v s a s dX s + 



4_ 2 \ 4 8C| 



2\2 



1/2 



$$\gamma(\alpha,\eta) := \min\{-2\alpha + 1 - 3\eta/2;\ \alpha/2 - \eta/2;\ 7\alpha/2 - 3/2;\ 3\alpha/2 - 1/2 - \eta;\ 5\alpha/2 - 1 - \eta/2\}.$$

And assume

c ( 7 ') J EEo ^I X > X > X ]f ft ~ Jo v s (?sds /5 n -A in D[0, 1] for a (nonrandom) sequence (<5 n )n>i 

with 5 n -»■ and l/$„ = o (rt^)) . 
We have the following 

Proposition 1. Assume the conditions of Theorem 2, C(7') and 3/7 < a < (2 — 3ry)/4 wii/i 77 € 
[0,2/21). Suppose is a.s. continuous and bounded on [0,1] for j = 2,3. Moreover, define a 

partition [r^Ti+i] := [*idig)*(i+i)rfig] which is a block of d±q time intervals over the complete grid with 
1/di = o (l/n 1-2 "), maxj |tj — T£_i| = o p (l) and S n / min^ |t$ — Tj_i| = O p (l); and let 



af t . :=-Vy t , , ,--Vy t , „. 

y 3=0 y j=0 



i+p-j 



Then 



E C^ AF Ti -A f v s a s dX s in D[0, 1]. 



Proof of Proposition 1 is given in the supplementary article Li, Zhang and Zheng (2013).

According to the above proposition, a consistent estimator for the bias $\frac{2}{3}\int_0^1 v_s\sigma_s\,dX_s$ is given by



°- 3 ^ f( 2 )( T . 



Finally, we define our feasible unbiased estimator as 

l! ^(1 + A(p,g)) 

The following theorem gives the CLT for our final estimator. 

Theorem 3. Under the assumptions of Theorem 2 and Proposition 1, stably in law, 



sTi ((x^x) 1 -(x,x) 1 ) 



4m, 



9 



8C[ 
C it 



2\2 



1/2 



dB, 



where $B$ is a standard Brownian motion independent of $\mathcal{F}_1$.
4. Simulation studies



In this section, we conduct simulation studies. We investigate the performance of our proposed estimator $\widehat{\langle X,X\rangle}_1$ compared with existing popular estimators in both endogenous and non-endogenous cases. We shall use two data generating mechanisms for $X$: (1) a constant volatility Brownian bridge; and (2) a stochastic volatility Heston bridge. In each case, we start the latent process $X$ at $X_0 = \log(5)$, let the standard deviation of the noise be $\sigma_\varepsilon := (\sigma^2_\varepsilon)^{1/2} = 0.0005$ and simulate 1,000 sample paths for the observed price process $Y$.



4.1. Estimators used for comparison

Below we briefly recall four commonly used volatility estimators: the two scales realized volatility (TSRV) of Zhang, Mykland and Aït-Sahalia (2005), the multi-scale realized volatility (MSRV) of Zhang (2006), the Realized Kernel estimator of Barndorff-Nielsen et al. (2008), and the Pre-averaging estimator of Jacod et al. (2009).

The (small-sample adjusted) TSRV estimator is given by

$$\widehat{\langle X,X\rangle}^{tsrv}_1 = \left(1 - \frac{N_1 - K_{tsrv} + 1}{N_1 K_{tsrv}}\right)^{-1}\left(\frac{1}{K_{tsrv}}\sum_{k=1}^{K_{tsrv}}[Y,Y]^{(k)}_1 - \frac{N_1 - K_{tsrv} + 1}{N_1 K_{tsrv}}\,[Y,Y]_1\right),$$

where the data are divided into $K_{tsrv}$ non-overlapping sub-grids and $[Y,Y]^{(k)}_1$ is the RV on the $k$th sub-grid. Zhang, Mykland and Aït-Sahalia (2005) provided a guideline on the choice of the grid allocation. If we pretend that the volatility is constant, then the optimal choice for the grid allocation is $K_{tsrv} = c_{tsrv}N_1^{2/3}$, where, in practice, one can set

$$c_{tsrv} = \left(\frac{12\,([Y,Y]_1/(2N_1))^2}{([Y,Y]^{sub}_1)^2}\right)^{1/3},$$

where $[Y,Y]^{sub}_1$ is the RV based on sparse sampling. Here, we implement $[Y,Y]^{sub}_1$ at 5 minutes frequency.
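For orientation, the following Python sketch computes a small-sample adjusted TSRV in the standard form of Zhang, Mykland and Aït-Sahalia (2005): the average of the sub-grid RVs minus a multiple of the full-grid RV, rescaled by one minus that multiple. The number of sub-grids `k` is passed in by hand rather than chosen by a data-driven rule, and the simulated data are an illustrative assumption; this is not claimed to reproduce the authors' implementation.

```python
import numpy as np


def tsrv(y, k):
    """Small-sample adjusted two scales realized volatility with k sub-grids."""
    y = np.asarray(y, dtype=float)
    n = y.size - 1                                   # number of returns N_1
    if not 1 < k < n:
        raise ValueError("need 1 < k < number of returns")
    rv_all = np.sum(np.diff(y) ** 2)                 # RV on the complete grid
    # The k sub-grids together contain every lag-k difference exactly once.
    rv_avg = np.sum((y[k:] - y[:-k]) ** 2) / k       # average RV over the sub-grids
    ratio = (n - k + 1) / (k * n)                    # average sub-grid size over N_1
    return float((rv_avg - ratio * rv_all) / (1.0 - ratio))


# Illustrative data (assumed): constant volatility, equidistant times, Gaussian noise.
rng = np.random.default_rng(2)
n, sigma, noise_sd = 46_800, 0.02, 0.0005
x = np.concatenate(([0.0], np.cumsum(sigma / np.sqrt(n) * rng.standard_normal(n))))
y = x + noise_sd * rng.standard_normal(n + 1)

tsrv_estimate = tsrv(y, k=20)
naive_rv = float(np.sum(np.diff(y) ** 2))
print(tsrv_estimate, naive_rv, sigma ** 2)
```

The naive RV on the noisy data is dominated by the noise term $2N_1\sigma^2_\varepsilon$, while the two-scale combination removes it.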
The MSRV estimator, which is a rate-optimal extension to TSRV, is given as follows 



Kmsrv -I 3 
■— msrv r — - 1 x — - (h \ 

= £ A rfE[ y ' y ]! } ' 

.7 = 1 J k=l 



where Ai = ai + {{N x + l)/2) -1 , A 2 = a 2 - ((iVi + l)/2)" 1 and A f = at for i > 3 with a 
h{i j ' K msTV )i j 1 K msrv h (i / K msrv )i / (1K msrv ) , for i — 1, . . . , K msrv , where K msri 
h{x) = 12x — 6. The optimal choice of c msrv when the volatility is constant is 



Cmsrv^i and 



'T 3 + T 4 + ((T 3 + T 4 ) 2 + 12Tir 2 ) 
2T 2 



1/2 \ V2 



where T x = 48([V, V]i/(2AM) 2 , T 2 = 52([V, V] 1 /(2A^ 1 )) 2 /35, T 3 = 24([V, V] 1 /(2A^ 1 )) 2 /5 and T 4 
48[V,V]f 6 ([y,y]i/(2Af 1 ))/5. 

The Realized Kernel estimator is defined as 

0TX)* ^ = [Y,Y) 1+ J2 fk((h - l)/H) [ £ {AY u AY tt _ h + AY ti AY ti+h ) 



h=l 



i=l 






where $H = c_{ker}N_1^{1/2}$ and $f_k$ is a kernel function. We choose the Parzen kernel:

$$f_k(x) = \begin{cases} 1 - 6x^2 + 6x^3 & \text{for } 0 \le x \le 1/2; \\ 2(1-x)^3 & \text{for } 1/2 \le x \le 1. \end{cases}$$

Under constant volatility, the optimal choice for c ker in practice is given by 



Ck - = { ) [if r /fc + y {fk ) + 3fk {fk (0) + /; 

where f°'° = J f k (x) 2 dx, /°' 2 = Jq 1 f k (x)f' k \x)dx and /°' 4 = J Q f k (x)f%'(x)dx. 
The Pre-averaging estimator is as follows: 

9(p 2 V N i ~^ 29 z ip 2 N 1 

where cpi = 1, if2 = 1/12 and 

1 / fcn-i fcn/2-l 

Ay s = — ^ y+j - ^ 

™ \j=k n /2 j=0 

with $k_n = \sqrt{N_1}\,\theta$. The optimal choice of $\theta$ when the volatility is constant is

$$\theta = 4.777\,([Y,Y]_1/(2N_1))^{1/2}/([Y,Y]^{sub}_1)^{1/2}.$$

Remark 3. The grid allocation schemes in constructing the above estimators are optimal in the sense of achieving the efficient asymptotic variance bound when $(\sigma_t)$ is constant. However, in practice there is no optimal choice since, for instance, $(\sigma_t)$ is random and time dependent. See Remarks 2 and 3 in Jacod et al. (2009) for related discussions on this. In our case, due to the more complex model assumptions, i.e. data with time endogeneity and noise, and grid allocation scheme, i.e. bivariate setting $(p,q)$ in contrast to the existing univariate cases, we do not provide a theoretical optimal choice but rather give below some practical guidelines.

Back to our estimator $\widehat{\langle X,X\rangle}_1$, there are several tuning parameters ($n$, $\ell$, $p$, $q$ and $d_1$) that one has to determine. Regarding $n$, which characterizes the sampling frequency, one can use the average number of transactions per day over the past, say, 30 days as an approximation. As for $(\ell, p, q)$, notice that Theorem 2 suggests $\ell \sim C_\ell n^{\alpha}$ (hence $q \sim n/\ell$) and $p \sim C_p n^{3\alpha-1}$. On the one hand, one should choose $\ell$ as large as possible in order to have a higher convergence rate. On the other hand, a large $\ell$ induces a small $q$ and hence a small $p$ (recall $q > p$), and the main role that $p$ plays is to reduce the microstructure noise. Hence, one should also be aware of the magnitude of the microstructure noise when choosing an appropriate $p$, and $p$ cannot be too small when prices are heavily contaminated. Under the simulation setting below, the sampling frequency is around $n = 46{,}800$, and the standard deviation of the noise is $\sigma_\varepsilon = 0.0005$. We choose $p = 5$, which is found to be good enough to reduce the microstructure noise effect. In practice, one can use (7) to estimate the standard deviation of the noise and come up with a reasonable choice of $p$. The block size $q$ should be larger than $p$ and is chosen as 20 (and $\ell \approx 2{,}340$). As to $d_1$, this depends on, for example, how volatile the volatility process is, of which one can get a rough idea by looking at a suitable estimate of the spot volatilities. If the volatility process is more volatile, one should divide the whole time interval into shorter time periods, i.e., choose a smaller $d_1$. In our simulation, we choose $d_1 = 100$, i.e. dividing the complete grid into around 20 blocks.
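The arithmetic behind these choices can be checked in a few lines of Python. The sketch takes $\ell$ to be roughly $n/q$ and uses the fact that each bias-correction block spans $d_1 q$ intervals of the complete grid; the variable names are ours.

```python
n, p, q, d1 = 46_800, 5, 20, 100

ell = n // q                      # sub-grid sampling frequency, about n / q
block_length = d1 * q             # complete-grid intervals per bias-correction block
n_blocks = n / block_length       # number of such blocks covering [0, 1]

print(ell, block_length, n_blocks)
```

This gives $\ell = 2{,}340$ and about 23 blocks, in line with the "around 20 blocks" quoted in the text.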

We next present our three simulation designs and the corresponding results. 

4.2. Design I: Brownian bridge with hitting times

We first consider the case when the latent price process $X$ follows a Brownian bridge with (constant) volatility $\sigma$ that starts at $X_0$ and ends at $X_0 + 4\sigma$. $X$ can be expressed as (see p. 358 of Karatzas and Shreve (1991))

$$dX_t = \frac{X_0 + 4\sigma - X_t}{1-t}\,dt + \sigma\,dW_t,$$

where $W_t$ is a standard Brownian motion. In this study, we set $\sigma = 0.02$. The sampling times are generated as follows: let $a = 5\sigma$, $b = \sigma/10$, $n = 46{,}800$, $\ell' \approx 16{,}800$ (roughly $n^{19/21}$), and $q' = \lfloor n/\ell' \rfloor$. Then

(1) For j = 0,1,2,..., q', tj = ±; 

(2) Fori = 1,2,..., 

Sparse sampling: ti q > + i = inf{t > : X t — Xt. , = either a/VF or — b/V¥}; 

Intensive sampling: t iql+j = t iq > +1 + for j = 2, . . . , q' . 

The mean observation duration when sampling sparsely is about $1/(2\ell')$, roughly 3 times the observation duration when sampling intensively. If, as $n \to \infty$, $\ell'$ grows at the rate of $n^{19/21}$, then the limit in C(7) actually vanishes; however, as one can see from the simulation results below, the (finite sample) bias correction discussed in Subsection 3.3 can substantially reduce the (finite sample) bias.

Figure 2 displays the histogram and normal Q-Q plot for the estimator $\widehat{\langle X,X\rangle}_1$ based on the 1,000 simulated samples. The plots show that the CLT works well in finite samples. In Table 1 we compare the performances of the four estimators that we discussed in Section 4.1, the "Uncorrected" estimator $F^{(2)}(1)$ defined in (10), and our final estimator $\widehat{\langle X,X\rangle}_1$. From the table one can see that our estimator provides the smallest RMSE and has substantially smaller bias than the others (reduced by more than 80%) while maintaining similar efficiency (standard deviation).
Figure 2: Histogram and Q-Q plot (against theoretical normal quantiles) of the estimator $\widehat{\langle X,X\rangle}_1$ for Design I. The red vertical line in the histogram indicates the true value of the target.



Table 1: Performance of the six estimators in the presence of endogenous time for Design I, the constant volatility case. Our estimator $\widehat{\langle X,X\rangle}_1$ provides the smallest RMSE. The RMSE is reduced by more than 50%; the bias is reduced by more than 80% while the standard deviation is kept at the same level as others.

              TSRV       MSRV       Kernel     Pre-averaging  Uncorrected  $\widehat{\langle X,X\rangle}_1$
RMSE          3.734e-05  3.553e-05  3.810e-05  3.340e-05      3.300e-05    1.621e-05
sample bias   3.300e-05  3.163e-05  3.454e-05  2.927e-05      2.911e-05    -4.997e-06
sample s.d.   1.748e-05  1.619e-05  1.609e-05  1.609e-05      1.555e-05    1.543e-05


4.3. Design II: Heston bridge with hitting times

In order to further investigate the performance of our estimator under more complex situations, in this subsection we consider the following stochastic volatility model

$$dX_t = \frac{X_0 + 4\vartheta^{1/2} - X_t}{1-t}\,dt + \sqrt{V_t}\,dW_t,$$

$$dV_t = \kappa(\vartheta - V_t)\,dt + \gamma V_t^{1/2}\,dW^{\sigma}_t,$$

where $W_t$ and $W^{\sigma}_t$ are standard Brownian motions with instantaneous correlation coefficient $\rho$, and $\kappa$, $\vartheta$ and $\gamma$ are positive constants. We consider the situation when $X$ starts at $X_0$ and ends at $X_0 + 4\vartheta^{1/2}$. In the simulation, we set $\vartheta = 0.0004$, $\gamma = 0.5/252$, $\kappa = 5/252$ and $\rho = -0.5$. Here, we choose a moderate value $-0.5$ for $\rho$ to represent the leverage effect. The leverage effect can be bigger for indices, as studied by Aït-Sahalia and Kimmel (2007) and Aït-Sahalia et al. (2012). Times are generated according to the same hitting rule as in Design I. We can see from Table 2 that in this more complex situation, our estimator again has substantially smaller bias and RMSE than the others. We did not include the sample standard deviation here since the integrated volatility to be estimated in this case depends on the sample path and is random.
Table 2: Performance of the six estimators in the presence of endogenous time for Design II, the stochastic volatility case. Our estimator again provides the smallest RMSE. The RMSE is reduced by more than 50%; the bias is reduced by more than 80%.

              TSRV       MSRV       Kernel     Pre-averaging  Uncorrected  $\widehat{\langle X,X\rangle}_1$
RMSE          3.824e-05  3.579e-05  3.835e-05  3.387e-05      3.375e-05    1.636e-05
sample bias   3.393e-05  3.175e-05  3.463e-05  2.965e-05      2.974e-05    -4.215e-06



4.4. Design III: Brownian bridge with independent Poisson times

The goal of this design is to check the performance of our estimator when the sampling times are not endogenous. We again assume the Brownian bridge dynamic for $X$ as in Design I. The observation times are now generated from an independent Poisson process with rate 46,800. Table 3 reports the results of the performance comparison, and we can see that our estimator performs similarly to the other estimators in this case.
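A data-generating step of this kind can be sketched in Python as follows. The sketch relies on the standard representation of a Brownian bridge from $X_0$ to $X_0 + 4\sigma$, namely $X_t = X_0 + 4\sigma t + \sigma(W_t - tW_1)$, which has the same law as the solution of the bridge equation of Design I. Adding the endpoints 0 and 1 to the Poisson times and using Gaussian noise are assumptions of this example; the paper does not spell out these details.

```python
import numpy as np


def simulate_design_iii(rate=46_800, sigma=0.02, noise_sd=0.0005,
                        x0=np.log(5.0), seed=0):
    """Brownian bridge observed with additive noise at independent Poisson times."""
    rng = np.random.default_rng(seed)
    # Given the count, the points of a Poisson process on [0, 1] are sorted uniforms.
    n_events = rng.poisson(rate)
    inner = np.sort(rng.uniform(0.0, 1.0, n_events))
    times = np.concatenate(([0.0], inner, [1.0]))    # endpoints added (assumption)

    # Brownian motion at the sampling times, then pinned to form the bridge.
    dw = np.sqrt(np.diff(times)) * rng.standard_normal(times.size - 1)
    w = np.concatenate(([0.0], np.cumsum(dw)))
    x = x0 + 4.0 * sigma * times + sigma * (w - times * w[-1])

    y = x + noise_sd * rng.standard_normal(x.size)   # observed log prices
    return times, x, y


times, x, y = simulate_design_iii()
rv_latent = float(np.sum(np.diff(x) ** 2))
print(times.size, x[0], x[-1], rv_latent)
```

Because the times are drawn independently of the Brownian motion, there is no endogeneity in this design by construction.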

Table 3: Performance of the six estimators when the observation times are not endogenous. The performance of our estimator is comparable to others.

              TSRV       MSRV       Kernel     Pre-averaging  Uncorrected  $\widehat{\langle X,X\rangle}_1$
RMSE          1.486e-05  1.375e-05  1.434e-05  1.373e-05      1.312e-05    1.568e-05
sample bias   2.643e-06  1.584e-06  4.144e-06  -2.847e-07     -1.274e-06   -7.723e-06
sample s.d.   1.463e-05  1.367e-05  1.374e-05  1.373e-05      1.307e-05    1.365e-05



In summary, one observes from Tables 1-3 that when sampling times are endogenous (Designs I 
and II), one can have substantial reductions in RMSE and bias by using our estimator. When there 
is no endogeneity (Design III), our estimator performs comparably to others. 

5. Concluding remarks 

In this paper, we establish a theoretical framework for dealing with effects of both the endogenous 
time and microstructure noise in volatility inference. An estimator that can accommodate both 
issues is proposed. Numerical studies are performed. The results show that our proposed estimator 
can substantially outperform existing popular estimators when time endogeneity exists, while performing comparably to others when there is no endogeneity.
Appendix A. Proofs



Throughout the proofs, $C$, $c$, $C_1$, etc. denote generic constants whose values may change from line to line. Moreover, since we shall establish stable convergence, by a change of measure argument (see e.g. Proposition 1 of Mykland and Zhang (2012)) we can suppress the drift and assume that

1. $\mu_t = 0$.

Moreover, because of the local boundedness condition on $\sigma^2$, by standard localization arguments we can assume without loss of generality that

2. $0 < c \le \sigma_t \le \sigma_+$, where $c$ and $\sigma_+$ are nonrandom numbers,

see e.g. Mykland and Zhang (2009) and Mykland and Zhang (2012). Similarly, we can without loss of generality strengthen the assumption on $\Delta_n$ and $N_1$ in C(2)-C(4) as follows:

3. $\Delta_n \le C/n^{1-\eta}$; and

4. $n/C \le N_1 \le Cn$.

Appendix A.1. Prerequisites

In the proofs, we shall repeatedly use the following inequalities.

Burkholder-Davis-Gundy (BDG) inequality with random times:

First, if the $t_i$'s are stopping times and $f(s)$ is adapted with $\max_{0\le s\le 1}|f(s)| \le f_+$, then by the Burkholder-Davis-Gundy inequality with random times (see, e.g., p. 161 of Revuz and Yor (1999)), for any exponent $\beta \ge 1$,

$$E\left|\int_{t_{i-1}}^{t_i} f(s)\,dW_s\right|^{\beta} \le C\,E\left(\int_{t_{i-1}}^{t_i} f(s)^2\,ds\right)^{\beta/2} \le C f_+^{\beta}\,E|t_i - t_{i-1}|^{\beta/2}.$$

Doob's $L^p$ inequality:

Second, for any process $Z$ which is either a continuous time martingale or a positive submartingale, Doob's $L^p$ inequality (see p. 54 of Revuz and Yor (1999)) states that, for any $\beta \ge 1$ and any $\lambda > 0$,

$$P\left(\sup_{s\in[0,1]}|Z_s| \ge \lambda\right) \le \frac{E|Z_1|^{\beta}}{\lambda^{\beta}},$$

and for $\beta > 1$,

$$\left(E\left[\sup_{s\in[0,1]}|Z_s|^{\beta}\right]\right)^{1/\beta} \le \frac{\beta}{\beta-1}\left(E|Z_1|^{\beta}\right)^{1/\beta}.$$

Therefore, if we can establish a bound order for $E|Z_1|^{\beta}$ ($\beta = 1$ or 2 in our case), then the same bound order applies in $D[0,1]$.

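We will also use the following result about the convergence of $\hat\sigma^2_\varepsilon$ to $\sigma^2_\varepsilon$.

Lemma 1. For $\hat\sigma^2_\varepsilon$ defined in (7), one has $\sqrt{N_1}\,(\hat\sigma^2_\varepsilon - \sigma^2_\varepsilon) = O_p(1)$.

Proof. First, notice that

$$\sqrt{N_1}\,(\hat\sigma^2_\varepsilon - \sigma^2_\varepsilon) = \frac{[X,X]_1}{2\sqrt{N_1}} + \frac{[X,\varepsilon]_1}{\sqrt{N_1}} + \frac{[\varepsilon,\varepsilon]_1 - 2N_1\sigma^2_\varepsilon}{2\sqrt{N_1}}.$$

By C(2) and the fact that $[X,X]_1 = O_p(1)$,

$$[X,X]_1/(2\sqrt{N_1}) = O_p(1/\sqrt{n}).$$

As to $[X,\varepsilon]_1/\sqrt{N_1}$, we treat it as follows: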



*i<i 



t;<l 



We have 



E 



1 *;<1 



2 




) 


^1 



by C(2) again and $[X,X]_1 = O_p(1)$. The same argument applies to the other term. Hence, $[X,\varepsilon]_1/\sqrt{N_1} = O_p(1/\sqrt{n})$. For the last term $([\varepsilon,\varepsilon]_1 - 2N_1\sigma^2_\varepsilon)/(2\sqrt{N_1})$, we rewrite it as



([e,e] 1 -2N 1 a 



2vm vm ^ 

Similarly as above, we have 



*i<l 



4 + - 2<x £ 2 
2^i 



^E(4-^)j * =(l + ^)var( £ 2 ) = O p (l), 



and Et t <i £ *i-i £ ti = Opt 1 ) and ( £ t + £ t Nl ~ 2 °~s) / ( 2 VNi) = O p (l/y/n), completing the proof. 



Next, as we will deal with sums of a random number of random variables repeatedly, the following simple lemma turns out to be very useful.

Lemma 2. Suppose that $N$ is a random variable taking values in nonnegative integers, and $X_1, X_2, \ldots$ are nonnegative random variables satisfying

$$E\big(X_i I_{\{i \le N\}}\big) \le C \cdot P(i \le N), \quad \text{for all } i.$$

Then

$$E\sum_{i=1}^{N} X_i \le C \cdot E(N).$$

Proof. The conclusion follows from the fact that $\sum_{i=1}^{N} X_i = \sum_{i=1}^{\infty} X_i I_{\{i \le N\}}$ and the Monotone Convergence Theorem. ■
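A small simulation may help to see Lemma 2 at work. In the sketch below the $X_i$ are i.i.d. exponential with mean one and $N$ is the first index at which $X_i$ exceeds a threshold, so that $\{i \le N\}$ depends only on $X_1, \ldots, X_{i-1}$ and the condition of the lemma holds with $C = 1$. This setup, the function name, and the parameter values are illustrative assumptions, not part of the paper.

```python
import random


def simulate_random_sum(n_paths=20000, threshold=2.0, seed=0):
    """Return Monte Carlo estimates of E[sum_{i<=N} X_i] and E[N], where
    X_i ~ Exp(1) i.i.d. and N = first i with X_i > threshold."""
    rng = random.Random(seed)
    total_sum = 0.0
    total_count = 0
    for _ in range(n_paths):
        s = 0.0
        n = 0
        while True:
            x = rng.expovariate(1.0)
            n += 1
            s += x
            if x > threshold:
                break
        total_sum += s
        total_count += n
    return total_sum / n_paths, total_count / n_paths


# With C = 1 the lemma predicts E[sum X_i] <= E[N]; in this particular
# setup the bound is attained (Wald's identity), up to simulation error.
mean_sum, mean_n = simulate_random_sum()
print(mean_sum, mean_n)
```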

Appendix A.2. Proof of Theorem 1: single sub-grid case
The basic idea is to decompose

$$\widehat{\langle X,X\rangle}_t^{LA} - \langle X,X\rangle_t = [\bar Y,\bar Y]_t^{S} - \frac{2L_t}{p}\,\hat\sigma_\varepsilon^2 - \langle X,X\rangle_t$$

into existing familiar quantities and other negligible terms. The proof is divided into three steps.
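To make the object being decomposed concrete, here is a minimal simulation sketch of a single sub-grid local-averaging estimator of the form $[\bar Y,\bar Y]^S - (2L/p)\hat\sigma_\varepsilon^2$. It rests on several assumptions that are not confirmed by the text at this point: block $i$ averages the $p$ observations with indices $iq+1,\ldots,iq+p$, the noise variance is estimated by $[Y,Y]_1/(2N_1)$, $L$ is taken to be the number of increments on the sub-grid, and observation times are equally spaced with constant volatility. All function names and parameter values are hypothetical.

```python
import math
import random


def simulate_noisy_prices(n, sigma, sigma_eps, rng):
    """Return Y_0..Y_n with Y = X + eps on an equally spaced grid of [0, 1]."""
    dt = 1.0 / n
    x = 0.0
    y = [x + rng.gauss(0.0, sigma_eps)]
    for _ in range(n):
        x += sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        y.append(x + rng.gauss(0.0, sigma_eps))
    return y


def realized_variance(series):
    """Sum of squared successive differences."""
    return sum((b - a) ** 2 for a, b in zip(series, series[1:]))


def local_average_estimator(y, p, q):
    """RV of block averages minus the noise correction 2 L / p * sigma_eps_hat^2."""
    n = len(y) - 1
    sigma_eps_hat2 = realized_variance(y) / (2.0 * n)
    # Assumed indexing: block i averages observations iq+1, ..., iq+p.
    n_blocks = (n - p) // q + 1
    averages = [sum(y[i * q + 1:i * q + p + 1]) / p for i in range(n_blocks)]
    n_increments = len(averages) - 1
    return realized_variance(averages) - 2.0 * n_increments / p * sigma_eps_hat2


rng = random.Random(4)
sigma, sigma_eps = 0.3, 0.005
y = simulate_noisy_prices(100_000, sigma, sigma_eps, rng)
la_estimate = local_average_estimator(y, p=50, q=500)
naive_rv = realized_variance(y)
integrated_variance = sigma ** 2
print(la_estimate, naive_rv, integrated_variance)
```

In this toy setting the realized variance of the raw noisy prices is dominated by the noise, whereas the corrected estimator stays in the vicinity of the integrated variance.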
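Step 1: Introducing $\widetilde Y$

The local average can be decomposed as follows

$$\bar Y_{t_{i,0}} = \frac{1}{p}\sum_{j=0}^{p-1}\big(X_{t_{iq+p-j}} + \varepsilon_{t_{iq+p-j}}\big) = X_{t_{iq+1}} + \sum_{j=2}^{p}\frac{p-j+1}{p}\,\Delta X_{t_{iq+j}} + \bar\varepsilon_{t_{i,0}},$$

where

$$\bar\varepsilon_{t_{i,0}} := \frac{1}{p}\sum_{j=0}^{p-1}\varepsilon_{t_{iq+p-j}},$$

which is a sequence of independent random variables with common mean $E\bar\varepsilon = 0$, variance $E\bar\varepsilon^2 = \sigma_\varepsilon^2/p$, $E\bar\varepsilon^3 = E\varepsilon^3/p^2$ and $E\bar\varepsilon^4 = E\varepsilon^4/p^3 + 3(p-1)(\sigma_\varepsilon^2)^2/p^3$. Motivated by the above decomposition, we introduce the new process $\widetilde Y$ as follows

$$\widetilde Y_{t_{i,0}} = X_{t_{i,0}} + \bar\varepsilon_{t_{i,0}}, \quad \text{for } i = 0, \ldots, L_1.$$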
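The moment formulas for the averaged noise can be checked exactly on a small discrete example. The sketch below enumerates all outcomes of $p$ i.i.d. mean-zero noise terms and compares the exact moments of their average with the closed forms $\sigma_\varepsilon^2/p$, $E\varepsilon^3/p^2$ and $E\varepsilon^4/p^3 + 3(p-1)(\sigma_\varepsilon^2)^2/p^3$. The three-point noise distribution and the value of $p$ are arbitrary choices made for illustration only.

```python
from fractions import Fraction
from itertools import product


def averaged_noise_moments(values, p):
    """Exact moments (orders 1 to 4) of the average of p i.i.d. draws,
    each uniformly distributed on `values`."""
    n_outcomes = len(values) ** p
    moments = {k: Fraction(0) for k in (1, 2, 3, 4)}
    for combo in product(values, repeat=p):
        mean = Fraction(sum(combo), p)
        for k in moments:
            moments[k] += mean ** k
    return {k: m / n_outcomes for k, m in moments.items()}


values = (-2, 1, 1)  # equally likely outcomes with mean zero (illustrative)
p = 3
m2 = Fraction(sum(v ** 2 for v in values), len(values))
m3 = Fraction(sum(v ** 3 for v in values), len(values))
m4 = Fraction(sum(v ** 4 for v in values), len(values))

exact = averaged_noise_moments(values, p)
formula = {
    1: Fraction(0),
    2: m2 / p,
    3: m3 / p ** 2,
    4: m4 / p ** 3 + 3 * (p - 1) * m2 ** 2 / p ** 3,
}
print(exact)
print(formula)
```

Exact rational arithmetic is used so that the comparison is free of rounding error.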

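The strategy is that if the difference $[\bar Y,\bar Y]_t^{S} - [\widetilde Y,\widetilde Y]_t^{S}$, where similarly to the definition of $[\bar Y,\bar Y]_t^{S}$,

$$[\widetilde Y,\widetilde Y]_t^{S} := \sum_{t_{i,0}\le t}\big(\Delta\widetilde Y_{t_{i,0}}\big)^2 \quad\text{and}\quad \Delta\widetilde Y_{t_{i,0}} = \widetilde Y_{t_{i,0}} - \widetilde Y_{t_{i-1,0}},$$

is of a negligible order, then one needs only to deal with $[\widetilde Y,\widetilde Y]_t^{S}$.

Step 2: Determining the order of $[\bar Y,\bar Y]_t^{S} - [\widetilde Y,\widetilde Y]_t^{S}$

For notational convenience, we define for $k = 0, 1, \ldots, q-1$,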



B\ 



= e t k , and 



(A.l) 



tiq+j + k 1 



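and let

$$A_i = A_i^0, \quad B_i = B_i^0, \quad\text{and}\quad C_i = C_i^0.$$

Adopting the above notation, we can write

$$[\bar Y,\bar Y]_t^{S} - [\widetilde Y,\widetilde Y]_t^{S} = \sum_{t_{i,0}\le t}(\Delta A_i + \Delta B_i + \Delta C_i)^2 - \sum_{t_{i,0}\le t}(\Delta A_i + \Delta B_i)^2$$

$$= \underbrace{\sum_{t_{i,0}\le t}(\Delta C_i)^2}_{I} + \underbrace{2\sum_{t_{i,0}\le t}\Delta A_i\,\Delta C_i}_{II} + \underbrace{2\sum_{t_{i,0}\le t}\Delta B_i\,\Delta C_i}_{III}. \qquad \text{(A.2)}$$

By the Cauchy-Schwarz inequality, for any $t$,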

ti t O<l U q <l 

By the BDG inequality and the strong Markov property of $X$,



E[Cfl {Uq <i}] = E (l {Uq <i}E [Cf\ F Uq \) < CE 



ht iq <i}E E^ / 

\ 1 = 1 ^ J Uq+j 



< CE 



l {U q <i} 



P- 1 -2 
* ( E i^An 



.3=1 



P 



By Lemma 2 and the fact that $N_1 \le Cn$ and hence $L_1 \le Cn/q$ we then obtain



e(i) < ae e c ?) ^ 4E E c ?) ^ c p/^ n 



vti.O<l 



Next we study term III. In fact,



E(III) 2 = AE 



E E A ^ A ^ 



v*ig<l 



' X *i,0<t 



Hence, it follows from (A.3) that $III = O_p\big((1/(qn^{-\eta}))^{1/2}\big)$.



Finally we deal with term II.
Claim 1. // = 2 AAACi = O p (J^j + O p 

Proof of the Claim. First notice that 

E AAACi = E CiAAi - E C^tAAi 

tifi<t ti,0<t *i,0<* 



where, by BDG inequality and (A.3), we have that 



(A.3) 



(A.4) 



ti.n<t 



E [ E C^AJU ] <CEJ2 CUj^al < ClJ^ < C-^, 



(A.5) 



and hence E ti , < t Q-i A A = O p (y^J . Next define AlW := X tiq+1 - ^ (i _ 1)?+p . Then 



ti n<t 



k, <tj=2 



P 



(A.6) 



EE 

U,o<tj=2 



si 



1 j_1 

Ax(i) + -E(j+ m - 2 ) AX ^ 

^ m=2 



sqr+j 



S"2 
By BDG inequality, E<,i < Cip/n 1 v ; moreover, by BDG inequality again, 



E 



j-1 
V 



1 i_1 

AJ« + -^(j+m-2)AI i 

" m=2 



iq-\-m 



< Cq/n 1 -* 1 , 



hence, applying once more the BDG inequality one obtains that 

ipq 



e&y < c- 

It follows that ?2 = O p ((p/n 1 ~ 2,? ) 1 / 2 ) and moreover 

tp 



2-21) • 



II = O r , 



+ O v 



P \ A/ n l-2 V 



(A.7) 



To summarize, 



[Y,Y]t-[Y,Y}t = O p {-^ +0 



P 



n 



1— 2t? / • 



(Ai 



Remark 4. In the proof for Theorem 2 below, we will analyze [Y ,Y]f — [Y,Y]f in more detail. 
Notice that 1 = 2 Y^ t <t @i ~ 2 Y^f <t ^i-lQ — — ^X 4 j u '^ ere the end effect terms Cq and C\ t 
are O v ((p/n 1 ^) 1 / 2 ) and by BDG inequality, J2 t . Q<t d-\Ci is O p [pi 1 / 2 /n 1 ~ r ') . Hence, from (A. 2) 
and the analysis of terms I, II and III , 



[y, y]f - [y, ?}f = 2 J2 IE hr AXt 



U,o<t \j=2 



p 



iq+j 



U,o<tj=2 



p 



n 



\-2r) j ' 



(A.9) 



Moreover, 



±^(AX Uq+] f + 2 ^ (t "f V AA-,,. (I( . (A.,,.) 



i=2 



2<k<j<p 



P 



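Step 3: CLT for $\widehat{\langle X,X\rangle}_t^{LA}$
We first notice that

$$[\widetilde Y,\widetilde Y]_t^{S} = \sum_{t_{i,0}\le t}(\Delta X_{t_{i,0}})^2 + 2\sum_{t_{i,0}\le t}(\Delta X_{t_{i,0}})(\Delta\bar\varepsilon_{t_{i,0}}) + \sum_{t_{i,0}\le t}(\Delta\bar\varepsilon_{t_{i,0}})^2 = [X,X]_t^{S} + 2[X,\bar\varepsilon]_t^{S} + [\bar\varepsilon,\bar\varepsilon]_t^{S}. \qquad \text{(A.11)}$$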
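Hence, we have the following decomposition

$$\widehat{\langle X,X\rangle}_t^{LA} - \langle X,X\rangle_t = [\bar Y,\bar Y]_t^{S} - [\widetilde Y,\widetilde Y]_t^{S} + [\widetilde Y,\widetilde Y]_t^{S} - \langle X,X\rangle_t - \frac{2L_t}{p}\,\hat\sigma_\varepsilon^2$$

$$= \big([\bar Y,\bar Y]_t^{S} - [\widetilde Y,\widetilde Y]_t^{S}\big) + \big([X,X]_t^{S} - \langle X,X\rangle_t\big) + \Big([\bar\varepsilon,\bar\varepsilon]_t^{S} - \frac{2L_t}{p}\,\sigma_\varepsilon^2\Big) - \frac{2L_t}{p}\big(\hat\sigma_\varepsilon^2 - \sigma_\varepsilon^2\big) + 2[X,\bar\varepsilon]_t^{S}. \qquad \text{(A.12)}$$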



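Recall that $\ell \sim C_\ell n^{\alpha}$ and $p \sim C_p n^{\alpha}$. Then by (A.8), $[\bar Y,\bar Y]_t^{S} - [\widetilde Y,\widetilde Y]_t^{S}$ is $o_p(1/\sqrt{\ell})$. As to the term
$\frac{2L_t}{p}(\hat\sigma_\varepsilon^2 - \sigma_\varepsilon^2)$ in (A.12), by Lemma 1 together with C(2) and C(4), we have that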



1_U 
p 



2L t 



pVNi 



P \Py/n 



1 

71 

LA 



in D[0,1}. 



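Therefore, in order to prove the asymptotic property of $\sqrt{\ell}\,\big(\widehat{\langle X,X\rangle}_t^{LA} - \langle X,X\rangle_t\big)$, one only needs to prove the FCLT for the following quantity

$$\sqrt{\ell}\,\big([X,X]_t^{S} - \langle X,X\rangle_t\big) + \sqrt{\ell}\,\Big([\bar\varepsilon,\bar\varepsilon]_t^{S} - \frac{2L_t}{p}\,\sigma_\varepsilon^2\Big) + 2\sqrt{\ell}\,[X,\bar\varepsilon]_t^{S}. \qquad \text{(A.13)}$$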



Firstly, notice that 



2L 



:, e]f - ^a 2 = 2 £ U s 2 0fi - e 2 ^ - 2 £ e u ^ e Ufi . 

^ 1=1 V ^ 7 8=1 



Note that £ 2 0Q = O p (l/p), hence Vie 2 = o p (l), and so is Vie 2 L Q . Moreover, 



Li 



[ X i £ ]f - y^X^ X tj,o - ^ X t i+li0 ) £ t it0 + ^ X t Lt +i,a £ tL t ,a - AX tlfi e tofi . 



1=1 



Note that AJ( Lt+1 e 4it = O p ((l/(pln , ?)) 1 / 2 ) and so is AX tl £t 00 - We are hence led to study the 
following martingales 



M,:=Ve([X,X]f-(X,X} t ) 
M<":=^g(4, -f), 



(3) 



i=l 



^E( A ^,o-AX ti+li0 )£ t , . 



Then (A.13) can be rewritten as 



Af t + 2M t {1) - 2M t (2) + 2M t {3) + o p (l). 



(A.14) 
Simple calculation gives the corresponding predictable variation processes as follows 
(M (1 \M^) t Fi = £L t \av{e 2 ) = £L t (Ee 4 - (Ee 2 ) 2 ) 



Ee' , 2(a 2 ) 2 3(a 



+ 

2 r t 



'If 



Ce 2 



(M^,M(% 7 1 = ^^4_ 10 4 f^a 2 Y and 



o 



2 



P 



i=i 



-*j ((AX tlfi ) 2 + (AX tLt+u> ) 2 ) 



^2^(X,X) t a 2 , 

Up 



where in the last convergence we used the fact that £/p-J2ili(AX ti )(AX (i+1 ) — > in D[0, 1] since 
it is a martingale with predictable variation 

- 2 E(AX,, ) 2 / a 2 ds < £(AX,, ) 2 = P = o p (l). 



Furthermore, the predictable covariation processes of $M^{(1)}$, $M^{(2)}$ and $M^{(3)}$ are



Li 



Ee 3 -^ 



i=i 



i=i 



,3/2 



Op(l), 



it 



<M« M®), ^ =^(AX t , - AX u+lfi )E(e*) 



i=i 



^( £ " 3 )(AX tli0 - AX ti(+10 ) = O p (y/e/(p*n-v)) = op(l), and 

Li 



i=i 



3/2 



p 



Op(l), 



where the last order follows from the fact that $\sum_{i=1}^{L_t}(\Delta X_{t_{i,0}} - \Delta X_{t_{i+1,0}})\,\bar\varepsilon_{t_{i-1,0}} = O_p(1/p^{1/2})$ by considering its predictable variation process similarly to the way that we treat $M_t^{(3)}$. The Lindeberg type condition can be easily verified by using the same calculations as above and the assumption that $(\varepsilon_{t_i})_{i\ge 1}$ is an independent sequence with finite fourth moment. Therefore, the usual martingale central limit theorem gives



/ Wi{t) \ 
W 2 (t) 

V w 3 (t) J 



(A.15) 
where $W_1$, $W_2$ and $W_3$ are independent standard Brownian motions and the limiting covariance matrix process is given by

( 2(gaf) 2 /o*r.«fa 0^0 A 

{^if !>sds 

2^(X,X) t a 2 e J 



Finally, by Theorem 1 in Li et al. (2009), we have the following convergence for Mt 

4vV 



M, 



v s a s dX s + 



at dW(s), 



(A.16) 



3 Jo Jo V 3 9 

where $W(s)$ is a standard Brownian motion. Furthermore, it is easy to see that $\langle M, M^{(i)}\rangle_t = 0$ for $i = 1, 2, 3$, hence $W(t)$ is independent of $W_i(t)$, $i = 1, 2, 3$. Combining this fact with (A.15) and (A.16) yields the desired convergence.



Appendix A.3. Proof of Theorem 2: multiple sub-grids case
We shall establish the following stable in law convergence



/1 4- 1 

\ q k=0 



2N, 



pq 



\ [ v s a s dX s + / 
Jo Jo 



al-(l + A(p,q))(X,X) t 



1/2 



dB,. 



Similar to the convention of using notation $[\bar Y,\bar Y]_t^{S_k}$ to denote the RV of the locally averaged $Y$ process computed based on the $k$th sub-grid $S_k$, all subsequent notations in the proof with superscript $k$ or $S_k$ indicate that the same operation as performed on the sub-grid $S = S_0$ is applied to the $k$th sub-grid.

The proof for Theorem 2 also proceeds in three steps. Similar to the proof of Theorem 1, the proof for Theorem 2 is based on the following decomposition



a*-(l + A(p,q)){X,X) 



/1 I' 1 

\ q k=0 

/1 I- 1 1 I- 1 „ „ \ 

-M- Y,[ Y ^t - - E[ y < y ]f fc " Mp,q)(x,x) t 

\ q k=o q k=0 / 



I 



k=0 



k=0 



k=0 



II III 

Assuming $\ell \sim C_\ell n^{\alpha}$ and $p \sim C_p n^{3\alpha-1}$ with assumptions made in the theorem on $\alpha$ and $\eta$, we shall show in Step 1 that $I = o_p(1)$; in Step 2 that $II$ satisfies a martingale CLT; in Step 3 a CLT with asymptotic bias decomposition for term $III$; and, finally, sum up in Step 4.

Step 1

To show $I = o_p(1)$, we consider the difference



^g[F,F]f.-^g[f,y] 



S k 
t 



k=0 



k=0 



9-1 



E ( AC t) 2 + 2 E AA i AC i + 2 E AB i AC ! 



k=0 yt <t 



Z i,0- Z 



Z i,0- Z 



(A.17) 



/ 



adopting the previous notational convention for the single sub-grid case, where $A^k$, $B^k$ and $C^k$ are defined in (A.1). Roughly speaking, recalling (A.9) and (A.10) of Remark 4 from the end of Step 2 in the proof for Theorem 1, we expect the difference (A.17) to be



E 



pi p 



]T(AA t J 2 + o p (l) 



$$= \sqrt{\ell}\,A(p,q)\,[X,X]_t + o_p(1). \qquad \text{(A.18)}$$

It is easy to see that $\sqrt{\ell}\,A(p,q)\big([X,X]_t - \langle X,X\rangle_t\big) = o_p(1)$. Hence $I = o_p(1)$ if we can show that (A.18) holds.

We now verify (A.18). It is easy to see that the RHS of (A.17) equals

yEE( c ») 2 + ^E E c ti c i+ ^E E AB * Ac h 

k=0 t k <t q fc=0t* <t q k=0tl <t 
"> .. ' s .. ' v .. ' 



i. a 



Li 

VEE Ct x AA fe + ^EE C k AA k +o p (l) 



I.iii 



k = o th<t 



I. IV 



I.v 



We analyze them one by one. 

We start with I.i = ^ ^21=1 ^2 t k <t(^i) 2 - Notice that on each sub-grid 



"i,0- 



P- 1 -2 



P-i /i-i 



j=2 \m=l 
Therefore, term I.i can be rewritten as follows 

^ E(-,)-f (e (e p 4) ( e ;:| 



dominating term A 



^E EE ^« + ^ E 2 (e!e =^ I ax 



i=2 \j=2 ^ m=l P J 1 i= p -l \ j=2 P m=l P 

a fa L tq+p-t ( p-i • J-l 



+i 



E E ,E^-, + »« A^, (A.19) 
:= dominating term ^4 — edge term 1? + 

It is easy to see that the edge term B = o p (l). We shall further show that is negligible. To see 
that, notice that its expected predictable variation satisfies 

2 

^V^^EfE^E^.-,^.' 

H i \j=2 1 m=l 1 

= ^^E IE I E l v 3 —^ I AX wt 

C£p 3 



which follows from the fact that 



. 2 

p-2 I p-1 . . \ \ 3 

— . — j ] — m \ ^ v \ p° 

, m=l \ j=m+l 



£ | E ( E f Uu. uniformly in i 



Therefore, 



^ = P (J J^- 2 )=O p U = o p (l) in D[0, 1]. 



Next we estimate I.ii = Sfc=o <* ^i-i^i- ^ can be rearranged 



as 



o fa i+p~ 2 / i_ 9 • p- 1 



'■"-^ E E ^ E Tr AA ''.-.-i+™+i Ax ' 



p ^ — ' p 

i=q+l \j=l rn=l c 

o fi L «9+ 1 / p- 1 • p- 1 



U+i 



E E^E 



p L — ' p 

i=q+p— 1 \ j=l m=l 

^ L t qr+p-l / p-1 . p-1 
ZVt \ - / \ - J \ -> m 



+ ^ E E £ E -zr Ax ti- q - j+ m+i AX ti+1 . 



~L t q+2 \j=i-L t q " m=l ^ 
We denote the above quantity as $S_t^{(2)}$. Similar to the treatment for I.i,



p— 1 . p— 1 
j x - m 



< 



C 



i \j=l P m=l P 



q2 n l 2q 



Therefore, 



s\ 2) = o p 



e P 3 



\-2ri a 2 



Or, 



3^—277 



o p (l) inD[0,l]. 



Now we study I. Hi = ^ T,l=o J2t k <t AS* AC* . Noticing that the estimate in {jA.4h holds 
uniformly for sub-grids $S_k$, hence by the Cauchy-Schwarz inequality we obtain that

I.iii = O p {[l/{qn- r ')) 112 ^ = op(l) in D[0, 1). 

Now we come to I.iv = ^ YXi Si fc „<i C i-i AA i- B y Cauchy-Schwartz inequality again, as the 
estimate in (A.5) holds uniformly for sub-grids $S_k$, we have

I.iv = O p ([Ip/n 1 -^) 1 ' 2 ^ = op(l) in D[0, 1]. 
Finally we deal with I.v = YX=o £t* <t A A*- Similar to the decomposition flAl^ we have 

9-1 



y fc=0 



where, with AX^ := X tj? ,, t , - .Y, 

" J-l 



?2 



EE 

,fc <i j=2 
P 

EE 



AX 



J-l 



1 j_1 



m=2 



AX 



^ig+fe-f-J 



It is easy to see that 



fc=0 



3=1 * U<t 



dominating term B 



We next prove that 2y/Jjq Ylt=o ? 2 ^ s negligible. In fact, the estimate in (A. 7) holds uniformly for 
all the sub-grids, hence by the Cauchy-Schwarz inequality again we get that



9-1 



2V~e/qY J ^ = o p ({e P /n 



1-2»?\1/ 2 



Op(l) in £[0,1]. 



k=0 



Summing up the computations for I.i to I.v, we see that the two dominating terms appearing in I.i and I.v together give the first term in (A.18) and the rest gives the $o_p(1)$ term in (A.18).
Step 2

Now we deal with the term II, starting with $\frac{2\sqrt{\ell}}{q}\sum_{k=0}^{q-1}[X,\bar\varepsilon]_t^{S_k}$. Denote

$$\Delta_q X_{t_i} := X_{t_i} - X_{t_{i-q}}.$$

Combining terms with common factor and ordering them chronologically (according to the sequence $(\varepsilon_{t_i})_{i\ge 1}$) we get


2Vi 



q-l 



2\/Z 



fe=0 



Lj-l 



E( AA + fc — AA.fc ) £,k + AA.fc E-ik — AA.fc £,k 

\ l i,0 r »+l,0/ I i,0 L fc L fc i' °>° 

k=0 8=1 * ' * ' 

V E E i A <i X ti+i - \ x t i+q+] ] e u + remainder, 



<]P 



i=q+p \j=0 



where the remainder term is a sum similar to the one above over the $i$'s smaller than $q+p$, and can be easily shown to be $o_p(1)$. We shall further show that the first summand is also negligible, as follows



Var 



Mai 



r (L t -l)q+l fp-l 

~ X ' E E i A i X U +3 - ^ x U +q+] ] I eu 



qp 



i=q+p \j=0 
'p-l 



ft 



q 2 p 2 



E E fa**** ~ A 1 X ti 



+q+j J 



i \j=0 



q 2 p 2 



2 p- 1 



2 p- 1 

e 



E E ((vw 2 + (a,^ +9+j ) 2 ) - ^ EE a <a + a^ 



i j=Q 



i+q+j 



i j=0 



+ ^# E E lAr**™ " ^VW,] [ " ^ 

« 0<j<fc<p-l 



i + q+fc J 



(A.20) 

$$:= V_1 + V_2 + V_3. \qquad \text{(A.21)}$$

We have, firstly, by applying Lemma 2 and using the fact that $E(\Delta_q X_{t_i})^2 \le Cq/n^{1-\eta}$ for all $i$,

CI q CI 

n L p A n I 



qpn 



This, together with the Cauchy-Schwarz inequality, implies that $E|V_2| \le C\ell/(pqn^{-\eta}) \to 0$. Finally, using the Cauchy-Schwarz inequality again we have

$$E|V_3| \le Cp\,E(V_1) \le C\ell/(qn^{-\eta}) \to 0.$$

Second, for the term involving $\sum_{k=0}^{q-1}[\bar\varepsilon,\bar\varepsilon]_t^{S_k}$, following the way the terms of $\frac{2\sqrt{\ell}}{q}\sum_{k=0}^{q-1}[X,\bar\varepsilon]_t^{S_k}$ were rearranged, we have



q-l 

E^ 

k=0 



VI 

pq 



P-i 



E HE 



*i<* 



i=o 



P 



p— i . p— i 

v^p-J . -v^P-J 



i=i 



P 



P 



+ Vef.+Op(l) 

^ y ti<t 

where the $o_p(1)$ term is again due to the end effect.

We first deal with $M_t^{(4)}$. We need the following notation

$$J_1 := \{1, 2, \ldots, p-1\}, \quad J_2 := \{q-p+1, q-p+2, \ldots, q-1\} \quad\text{and}\quad J_3 := \{q, q+1, \ldots, q+p-1\}.$$

Let $J := \bigcup_{m=1}^{3} J_m$ and $j_{\max}$ be the largest element in $J$. Moreover, denote the following weight function
' 4^ for j€ Ji; 

-22=|±i for j G J 2 ; and 



p 



for j G J 3 . 



Notice that $|w(j)| \le 4$ for all $j \in J$. $M^{(4)}$ is a martingale with quadratic variation that can then be represented as



(M^,M^) t 



E 



**<* 



p— i . p— i 

3=0 F j=l 1 



P-I 



<3+i 



+<E 



i=i 



P_^J 
P 



-£ti- 



24&r| 



p-1 



EE(p- ^ ^SEE -(i) 2 (4_, - 



u<tj=i 



p^q 



U<tjeJ 



^ y ti<tj,keJ,j^k 
24£a 2 E ^ ., 2 2 



p 4 q 2 



U<tj=l 



pq 2 n C p J 



where the last line follows from the assumption that $L_t/\ell \to \int_0^t r_s\,ds$ and the third equality is explained as follows. We take the third term on the RHS of the second equality for example while the second term can be treated more easily by a similar argument. Notice that this term can be rewritten as



St. := 



2lal 



" — 2 2 

p^q L 



E 

U<t* 



E E w ( k ) w U + k )hk£J, j+keJ} £ U 

j=i V k=i J 
where $t_* := \max\{t_i \le t\}$. Hence by Lemma 2, the BDG inequality, the boundedness of the $w(\cdot)$ function and the fact that the cardinality of the set $J$ is of order $p$, we have



t i <ct 



-/max~l / ^max j \ 

E E w ( k ) w U + k ) J {keJ, j+keJ} 

j=i V /c=i / 

•2 



P 



(1). 



Hence £,% = o p (l). 

Therefore, based on our moment assumption for $(\varepsilon_{t_i})_{i\ge 1}$, $M^{(4)}$ satisfies a CLT where the limiting distribution is a mixture of normals and the mixture component is the variance equal to
" Cl Jq r s ds; in other words, 



M 



(4) 



1/2 



where $B_t$ is a standard Brownian motion that is independent of $\mathcal{F}_1$.
As to the term $\frac{2\sqrt{\ell}}{pq}N_t(\hat\sigma_\varepsilon^2 - \sigma_\varepsilon^2)$, it follows from Lemma 1 and C(4) that



2y/l 
pq 

pq 

-Or, I 



E4 



2VI 

pq 



pq 



2Vi 



pq 

op(l) in£>[0,l]. 



N t [o*-o* e 



(A.22) 



Step 3

Finally, we prove a CLT for term III. We have
where



fc=0 



fc=0 



$$dM_t^k = 2\sqrt{\ell}\,(X_t - X_{t_*^k})\,dX_t$$

and $t_*^k$ is the largest time smaller than or equal to $t$ on the $k$th sub-grid. Therefore

1 9-1 t* 2\TP q ~ l C l ft 

M t = - V / dM* = ~y~ V / (X s - X tk )dX s = 2VI / f n (s)dX s , 

q k=o Jo q k=o Jo Jo 



where 



0, for s £ [0,t p ); 



fn(s) 



\ Ej=oP^ - X ti-j), fo r s £ [ti,t i+1 ) and p < i < q + p; 
K X s - X u + Y%Zl ^ AA ti _ j+1 , for s S [U, t i+1 ) and i > q + p. 
$M_t$ is a martingale with quadratic variation $\langle M, M\rangle_t = 4\ell\int_0^t f_n(s)^2\sigma_s^2\,ds$. Since $Ef_n(s)^2 \le Cq/n^{1-\eta}$,



q+p 



f n (s) 2 a 2 ds = o p (l), 

Hence, we need only consider $s \ge t_{q+p}$, i.e., $i \ge q+p$. By Itô's formula,

$$df_n(s)^4 = 4f_n(s)^3\,dX_s + 6f_n(s)^2\sigma_s^2\,ds, \quad\text{for } s \in [t_i, t_{i+1}) \text{ and } i \ge q+p.$$

Hence



(A.23) 



{M,M) t = U / /„( S ) 2 <r 2 d S = - 



C?fn(s) 



/n(s) 3 ^ s 



(A.24) 
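The Itô identity $df^4 = 4f^3\,dX + 6f^2\sigma^2\,ds$ used to obtain (A.24) can be checked numerically on a discretized path. The sketch below takes the simplest case, $f = W$ a standard Brownian motion with $\sigma \equiv 1$ and no drift, and compares $W_1^4$ with the sum of the discretized stochastic and drift integrals. The Euler discretization, the function name, and the step count are assumptions made for this illustration only.

```python
import math
import random


def ito_fourth_power_check(n=100000, seed=3):
    """Compare W_1^4 with sum(4 W^3 dW) + sum(6 W^2 dt) on one simulated path."""
    rng = random.Random(seed)
    dt = 1.0 / n
    f = 0.0
    stochastic_integral = 0.0  # left-point (Ito) sum of 4 f^3 dW
    drift_integral = 0.0       # Riemann sum of 6 f^2 dt, with sigma = 1
    for _ in range(n):
        dw = math.sqrt(dt) * rng.gauss(0.0, 1.0)
        stochastic_integral += 4.0 * f ** 3 * dw
        drift_integral += 6.0 * f ** 2 * dt
        f += dw
    return f ** 4, stochastic_integral + drift_integral


lhs, rhs = ito_fourth_power_check()
print(lhs, rhs)
```

The two numbers agree up to a discretization error that shrinks as the number of steps grows.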



We first prove the second term on the RHS of (A.24) is negligible. In fact, by the BDG inequality,

$$Ef_n(s)^6 \le C\big(q/n^{1-\eta}\big)^3 \qquad \text{(A.25)}$$

uniformly in $s$. Hence



El / UsfdxA <E( f n (sfdX s , / UsfdX, 



<E f f n (sfa 2 ds 
Jo 

<a\\ Ef n {sfds 
Jo 



< C 



l-r, 



■n 



ds = 



■n 



(A.26) 



and $\ell\int_0^t f_n(s)^3\,dX_s \to 0$, in $D[0,1]$, as $n \to \infty$. Now we deal with the first term in (A.24). We shall only focus on the integral on $[t_{q+p}, t_*]$ where $t_*$ is the largest $t_i \le t$; the remainder term is negligible.
We then have



7" 

Ju , 



3-1 



EE 

ti<t I \j=0 



-AX, 



E^ax«, 

i*<t t<<t Vj'=l q 



11 



9-1 



3-1 



vi=i 



vi=l 



i7i 



By the BDG inequality and Lemma[2| / = O p (l/n l 2r! ) = o p (l). Moreover, for term IV, by comput- 
ing its quadratic variation and using ( |A.25 ) we get 

TV = O p (£q 3 / 2 /n 3 / 2 - 2 ^) = O p (l/(Vln- 2 ^)) = o p (l). 
As to term III, we treat it in the same fashion as above by defining for $s \in [t_{i-1}, t_i)$,



9-1 



X s := 4£(X S - Xt^f hr AXt 



J =1 



XW :=12/(*.-^_J a [£i-^AX. 



'9-1 



\U q 



Then 



III = J2 [ 1 dX s = [ XPdX s + [ X^a 2 s ds + o p {\), 
t<t Jt i - 1 Jo Jo 

which is again an o p (l) term by noting that (1) £(X S (1) ) 2 < C£ 2 • l/n 2 ~ 2,? • q/n 1 ^ < Cl/n 2 -^- 
and (2) £(X S (2) ) 2 < C£ 2 • ■ q/n 1 -"' < Cl/n 1 ^. Finally, by assumption C(6) we get the 

convergence of term and hence 



(M, M) t 4 I w s a A s ds for all t. 
Jo 



Next, we estimate the quadratic covariation between $M$ and $X$. To do so, we first notice that, by Itô's formula,



d(X, M)t = ~ E d(X, M k ) t = -J2 2 ~^d{X t -X^f- 1 -^ 2Vl(X t - X t , 



k=0 



q^ 3 

H k=0 



l dX, 



(A.27) 



k=0 



where 



5-1 



7X 2 -^d(x t -x tk f= 2 -l^Vid[x,x, x\f» . 



k=0 



3q 



k=0 



We next show that the martingale term in (A.27) is negligible. Rearranging terms the same way as 
we did for M t , we have 



9-1 



2Vl 



g n (s)dX s , 



where 



R t := f - J2 2Vi(X s - X tk fdX s 
Jo q k=o 

0, for s £ [0,t p ); 
9n{s) = < Yj) r =oi X s ~ x U-j) 2 , for s £ [U, t i+ i) and p < i < q + p; 
J2 q j=o( X s ~ Xti-j) 2 , for s S [ti,t i+ x) and i > q + p. 
Observe that by the Cauchy-Schwarz inequality and the BDG inequality,

2 

"1, — 1 J 



0<j<k<q-l 
n 2 

O 







E 












= E 






3=0 






^ Ca +1 n 2-2 V 



2-2jj 



n 



n 



Hence, uniformly in $s$ and $i$, $Eg_n(s)^2 \le Cq^4/n^{2-2\eta}$. Therefore, by the BDG inequality again,

i? /■*..„ „ /a 4 ^1 

< C- ^r^O, 



^(^) 2 < C^E I g n (s) 2 a 2 s ds < C 
Q Jo 



■n 



2-2r)g2 — £ n ~ 2r l 



and hence $R_t = o_p(1)$. Therefore, Assumption C(7) and (A.27) imply that

2 * 



(X,M) t 



3 , ».»s 



VcOeds for all t. 



It follows from the limit results in either Theorem B.4 (p. 65-67) of Zhang (2001) or Theorem 2.28 of Mykland and Zhang (2012) that, stably in law,

2 " 



M t => - v s a s dX s + 
J o 



4w s - -v? I a 4 



1/2 



(A.28) 



where $B_t$ is a standard Brownian motion that is independent of $\mathcal{F}_1$.

Step 4

Clearly $\langle M^{(4)}, M\rangle_t = 0$. The overall result then follows from (A.22) and (A.28).



References 

Aït-Sahalia, Y., and Kimmel, R. (2007) Maximum Likelihood Estimation of Stochastic Volatility Models. Journal of Financial Economics, 83 413-452.

Aït-Sahalia, Y., Fan, J., and Li, Y. (2012) The Leverage Effect Puzzle: Disentangling Sources of Bias at High Frequency. Journal of Financial Economics, forthcoming.

Aït-Sahalia, Y., and Mykland, P. A. (2003) The Effects of Random and Discrete Sampling When Estimating Continuous Time Diffusions. Econometrica, 71 483-549.

Aït-Sahalia, Y., Mykland, P. A., and Zhang, L. (2005) How Often to Sample a Continuous-Time Process in the Presence of Market Microstructure Noise. Review of Financial Studies, 18 351-416.

Bandi, F. M., and Russell, J. R. (2006) Separating Microstructure Noise from Volatility. Journal of Financial Economics, 79 655-692.

Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A., and Shephard, N. (2008) Designing Realized Kernels to Measure the ex post Variation of Equity Prices in the Presence of Noise. Econometrica, 76 1481-1536.






Barndorff-Nielsen, O. E., and Shephard, N. (2002) Econometric Analysis of Realized Volatility and Its Use in Estimating Stochastic Volatility Models. J. Roy. Statist. Soc. Ser. B, 64 253-280.

Duffie, D., and Glynn, P. (2004) Estimation of Continuous-Time Markov Processes Sampled at Random Times. Econometrica, 72 1773-1808.

Fan, J., and Wang, Y. (2007) Multi-scale Jump and Volatility Analysis for High-Frequency Financial Data. Journal of the American Statistical Association, 102 1349-1362.

Fukasawa, M. (2010a) Central Limit Theorem for the Realized Volatility Based on Tick Time Sampling. Finance and Stochastics, 14 209-233.

Fukasawa, M. (2010b) Realized Volatility with Stochastic Sampling. Stochastic Processes and Their Applications, 120 829-852.

Fukasawa, M., and Rosenbaum, M. (2012) Central Limit Theorems for Realized Volatility under Hitting Times of an Irregular Grid. Stochastic Processes and Their Applications, 122 3901-3920.

Hansen, P. R., and Lunde, A. (2004a) Realized Variance and IID Market Microstructure Noise. Technical report, Brown University, Dept. of Economics.

Hayashi, T., Jacod, J., and Yoshida, N. (2011) Irregular Sampling and Central Limit Theorems for Power Variations: The Continuous Case. Ann. Inst. H. Poincaré Probab. Statist., 47 1197-1218.

Jacod, J., and Protter, P. (1998) Asymptotic Error Distributions for the Euler Method for Stochastic Differential Equations. The Annals of Probability, 26 267-307.

Jacod, J., Li, Y., Mykland, P. A., Podolskij, M., and Vetter, M. (2009) Microstructure Noise in the Continuous Case: The Pre-averaging Approach. Stochastic Processes and Their Applications, 119 2249-2276.

Karatzas, I., and Shreve, S. E. (1991) Brownian Motion and Stochastic Calculus. Springer.

Kalnina, I., and Linton, O. (2008) Estimating Quadratic Variation Consistently in the Presence of Endogenous and Diurnal Measurement Error. Journal of Econometrics, 147 47-59.

Li, Y., and Mykland, P. A. (2007) Are Volatility Estimators Robust with Respect to Modeling Assumptions? Bernoulli, 13 601-622.

Li, Y., Renault, E., Mykland, P. A., Zhang, L., and Zheng, X. (2009) Realized Volatility When Sampling Times are Possibly Endogenous. Manuscript. Available at SSRN: http://ssrn.com/abstract=1525410

Li, Y., Zhang, Z., and Zheng, X. (2013) Supplement to "Volatility Inference in The Presence of Both Endogenous Time and Microstructure Noise".

Meddahi, N., Renault, E., and Werker, B. (2006) GARCH and Irregularly Spaced Data. Economics Letters, 90 200-204.

Mykland, P. A., and Zhang, L. (2006) ANOVA for Diffusions and Ito Processes. Annals of Statistics, 34 1931-1963.

Mykland, P. A., and Zhang, L. (2009) Inference for Continuous Semimartingales Observed at High Frequency. Econometrica, 77 1403-1455.

Mykland, P. A., and Zhang, L. (2012) The Econometrics of High Frequency Data (to appear in Statistical Methods for Stochastic Differential Equations, M. Kessler, A. Lindner, and M. Sørensen, eds., Chapman and Hall/CRC Press), p. 109-190.

Phillips, P. C. B., and Yu, J. (2007) Information Loss in Volatility Measurement with Flat Price Trading. Working paper.

Podolskij, M., and Vetter, M. (2009) Estimation of Volatility Functionals in the Simultaneous Presence of Microstructure Noise and Jumps. Bernoulli, 15 634-658.

Renault, E., and Werker, B. J. (2011) Causality Effects in Return Volatility Measures With Random Times. Journal of Econometrics, 160 272-279.






Revuz, D., and Yor, M. (1999) Continuous Martingales and Brownian Motion. Springer-Verlag, Berlin.

Robert, C. Y., and Rosenbaum, M. (2012) Volatility and Covariation Estimation When Microstructure Noise and Trading Times are Endogenous. Mathematical Finance, 22 133-164.

Xiu, D. (2010) Quasi-Maximum Likelihood Estimation of Volatility with High Frequency Data. Journal of Econometrics, 159 235-250.

Zhang, L., Mykland, P. A., and Aït-Sahalia, Y. (2005) A Tale of Two Time Scales: Determining Integrated Volatility with Noisy High-Frequency Data. Journal of the American Statistical Association, 100 1394-1411.

Zhang, L. (2001) From Martingales to ANOVA: Implied and Realized Volatility. Ph.D. thesis, The University of Chicago, Department of Statistics.

Zhang, L. (2006) Efficient Estimation of Stochastic Volatility Using Noisy Observations: A Multi-Scale Approach. Bernoulli, 12 1019-1043.






